lexical unit
Analysis and Visualization of Linguistic Structures in Large Language Models: Neural Representations of Verb-Particle Constructions in BERT
Kissane, Hassane, Schilling, Achim, Krauss, Patrick
This study investigates the internal representations of verb-particle combinations within transformer-based large language models (LLMs), specifically examining how these models capture lexical and syntactic nuances at different neural network layers. Employing the BERT architecture, we analyse the representational efficacy of its layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'. Our methodology includes detailed dataset preparation from the British National Corpus, followed by extensive model training and output analysis using techniques such as multi-dimensional scaling (MDS) and generalized discrimination value (GDV) calculations. Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories. These findings challenge the conventional assumption that neural networks process linguistic elements uniformly and suggest a complex interplay between network architecture and linguistic representation. Our research contributes to a better understanding of how deep learning models comprehend and process language, offering insights into the potential and limitations of current neural approaches to linguistic analysis. This study not only advances our knowledge in computational linguistics but also prompts further research into optimizing neural architectures for enhanced linguistic precision.
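The GDV condenses how well labelled points separate into clusters into a single number. The following is a minimal Python sketch, assuming the commonly used definition (z-score each dimension and scale it by 0.5, then subtract the mean inter-class Euclidean distance from the mean intra-class distance and divide by the square root of the dimensionality); the abstract itself does not state the formula, and the function names are illustrative. Values near 0 indicate no separation; more negative values indicate better separation.

```python
import math
from itertools import combinations


def _scaled_zscore(points):
    """Z-score every dimension, then multiply by 0.5 (assumed GDV preprocessing)."""
    n, dims = len(points), len(points[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        col = [p[d] for p in points]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i, v in enumerate(col):
            # A constant dimension carries no class information; map it to 0.
            out[i][d] = 0.5 * (v - mean) / std if std > 0 else 0.0
    return out


def gdv(points, labels):
    """Generalized discrimination value (0 = no separation, more negative = better).

    Requires at least two classes and at least two points per class.
    """
    x = _scaled_zscore(points)
    dims = len(x[0])
    classes = sorted(set(labels))
    groups = [[x[i] for i, lab in enumerate(labels) if lab == c] for c in classes]

    def mean_intra(g):
        # Average distance over all unordered pairs inside one class.
        pairs = list(combinations(g, 2))
        return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

    def mean_inter(g, h):
        # Average distance over all cross-class pairs.
        return sum(math.dist(a, b) for a in g for b in h) / (len(g) * len(h))

    intra = sum(mean_intra(g) for g in groups) / len(groups)
    class_pairs = list(combinations(groups, 2))
    inter = sum(mean_inter(g, h) for g, h in class_pairs) / len(class_pairs)
    return (intra - inter) / math.sqrt(dims)


far = gdv([(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)], [0, 0, 1, 1])
near = gdv([(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)], [0, 0, 1, 1])
```

In a setting like the study's, `points` would hold one layer's embeddings for verb-particle tokens and `labels` their construction classes; the toy data here only show that the more widely separated pair of clusters (`far`) receives the more negative score.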
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Lv, Qitan, Wang, Jie, Chen, Hanzhu, Li, Bin, Zhang, Yongdong, Wu, Feng
Generation of plausible but incorrect factual information, often termed hallucination, has attracted significant research interest. Retrieval-augmented language models (RALMs), which enhance models with up-to-date knowledge, have emerged as a promising method to reduce hallucination. However, existing RALMs may instead exacerbate hallucination when retrieving lengthy contexts. To address this challenge, we propose COFT, a novel COarse-to-Fine highlighTing method that focuses on key texts at different levels of granularity, thereby avoiding getting lost in lengthy contexts. Specifically, COFT consists of three components: recaller, scorer, and selector. First, the recaller applies a knowledge graph to extract potential key entities in a given context. Second, the scorer measures the importance of each entity by calculating its contextual weight. Finally, the selector selects entities with high contextual weight using a dynamic threshold algorithm and highlights the corresponding paragraphs, sentences, or words in a coarse-to-fine manner. Extensive experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, which achieves an improvement of over 30% in the F1 score. Moreover, COFT also exhibits remarkable versatility across various long-form tasks, such as reading comprehension and question answering.
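The selector step (threshold the scorer's entity weights, then highlight the survivors) can be pictured with a small Python sketch. The abstract does not specify the dynamic threshold algorithm, so the rule used here (mean weight plus a hypothetical factor `alpha` times the standard deviation) is purely an illustrative assumption, as are the entity weights and the `<b>` highlight markers; only the word-level granularity is shown.

```python
import re
import statistics


def dynamic_threshold(weights, alpha=0.0):
    """Hypothetical rule: mean weight plus alpha population standard deviations."""
    values = list(weights.values())
    return statistics.fmean(values) + alpha * statistics.pstdev(values)


def select_entities(weights, alpha=0.0):
    """Keep entities whose contextual weight reaches the dynamic threshold."""
    threshold = dynamic_threshold(weights, alpha)
    return {entity for entity, w in weights.items() if w >= threshold}


def highlight_words(text, entities, open_tag="<b>", close_tag="</b>"):
    """Word-level highlighting only; paragraph and sentence levels are omitted."""
    if not entities:
        return text
    # Try longer entities first so multi-word entities beat their substrings.
    alternatives = sorted(entities, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(e) for e in alternatives) + r")\b"
    return re.sub(pattern, lambda m: open_tag + m.group(0) + close_tag, text)


# Hypothetical scorer output for a one-sentence context.
weights = {"Paris": 0.9, "France": 0.7, "city": 0.1, "river": 0.2}
selected = select_entities(weights)
marked = highlight_words("Paris is the capital city of France.", selected)
# marked == "<b>Paris</b> is the capital city of <b>France</b>."
```

Raising `alpha` makes the threshold stricter, so fewer entities are highlighted; a real coarse-to-fine selector would additionally decide whether to mark whole paragraphs or sentences.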
Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory
Hicke, Rebecca M. M., Kristensen-McLachlan, Ross Deans
Metaphors appear extensively across all domains of natural language, from the most sophisticated poetry to seemingly dry academic prose. A significant body of research in the cognitive science of language argues for the existence of conceptual metaphors, the systematic structuring of one domain of experience in the language of another. Conceptual metaphors are not simply rhetorical flourishes but are crucial evidence of the role of analogical reasoning in human cognition. In this paper, we ask whether Large Language Models (LLMs) can accurately identify and explain the presence of such conceptual metaphors in natural language data. Using a novel prompting technique based on metaphor annotation guidelines, we demonstrate that LLMs are a promising tool for large-scale computational research on conceptual metaphors. Further, we show that LLMs are able to apply procedural guidelines designed for human annotators, displaying a surprising depth of linguistic knowledge.
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
Sun, Chenxi, Zhang, Hongzhi, Lin, Zijia, Zhang, Jingyuan, Zhang, Fuzheng, Wang, Zhongyuan, Chen, Bin, Song, Chengru, Zhang, Di, Gai, Kun, Xiong, Deyi
Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, which accelerates the decoding process without sacrificing output quality. The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens, which form the basis of a lexical unit whose tokens can be decoded in parallel. Extensive experiments validate that our method substantially reduces decoding time while maintaining generation quality: a 33% speedup on natural language generation with no quality loss, and a 30% speedup on code generation with a negligible quality loss of 3%. Distinctively, LUD requires neither auxiliary models nor changes to existing architectures. It can also be integrated with other decoding acceleration methods, achieving an even more pronounced boost in inference efficiency. We posit that the foundational principles of LUD could define a new decoding paradigm for future language models, broadening the range of applications they can serve. All code is publicly available at https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD-. Keywords: Parallel Decoding, Lexical Unit Decoding, Large Language Model
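The idea of emitting a whole lexical unit per step can be illustrated with a Python sketch of the acceptance rule only. It assumes a hypothetical model interface that returns (token, probability) pairs for the next few positions in one forward pass, and a hypothetical confidence threshold `tau`; it does not reproduce LUD's data-driven training procedure.

```python
from typing import Callable, List, Tuple

Candidates = List[Tuple[str, float]]


def accept_lexical_unit(candidates: Candidates, tau: float = 0.9) -> List[str]:
    """Pick the tokens to emit in one decoding step.

    candidates holds (token, probability) for the next k positions, assumed to
    come from a single forward pass. The first token is always kept so that
    decoding makes progress; later tokens join the unit only while their
    probability stays at or above tau.
    """
    unit = [candidates[0][0]]
    for token, prob in candidates[1:]:
        if prob < tau:
            break
        unit.append(token)
    return unit


def decode(step_fn: Callable[[List[str]], Candidates], max_tokens: int,
           tau: float = 0.9) -> Tuple[List[str], int]:
    """Greedy loop that emits a whole lexical unit per model call."""
    output: List[str] = []
    calls = 0
    while len(output) < max_tokens:
        candidates = step_fn(output)
        if not candidates:  # the hypothetical model signals end of sequence
            break
        output.extend(accept_lexical_unit(candidates, tau))
        calls += 1
    return output[:max_tokens], calls


# A scripted stand-in for the model, keyed by the length of the prefix.
table = {
    0: [("New", 0.6), ("York", 0.95), ("City", 0.92), ("is", 0.4)],
    3: [("is", 0.8), ("big", 0.5)],
    4: [("big", 0.97)],
}
tokens, calls = decode(lambda prefix: table.get(len(prefix), []), max_tokens=10)
# tokens == ["New", "York", "City", "is", "big"], calls == 3
```

Five tokens are produced with three model calls instead of five; in this sketch, the saving comes entirely from the confident run "York City" being accepted together with "New".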
Compressing Context to Enhance Inference Efficiency of Large Language Models
Li, Yucheng, Dong, Bo, Lin, Chenghua, Guerin, Frank
Large language models (LLMs) have achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due both to significantly increased computational requirements, in memory and inference time, and to potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing (arXiv papers, news articles, and long conversations) on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and generation latency while maintaining performance comparable to that achieved with the full context. Specifically, we achieve a 50% reduction in context cost, resulting in a 36% reduction in inference memory usage and a 32% reduction in inference time, with only minor drops of 0.023 in BERTScore and 0.038 in faithfulness across four downstream applications, indicating that our method strikes a good balance between efficiency and performance.
EventNet-ITA: Italian Frame Parsing for Events
This paper introduces EventNet-ITA, a large, multi-domain corpus annotated with event frames for Italian, and presents an efficient approach for multi-label Frame Parsing, which is then evaluated on the dataset. EventNet-ITA covers a wide range of individual, social and historical phenomena; its main contribution is to provide the research community with a resource for textual event mining and a novel and extensive tool for Frame Parsing in Italian.
Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering
Large language models (LLMs) have received significant attention for their remarkable performance across various tasks. However, their fixed context length poses challenges when processing long documents or maintaining extended conversations. This paper proposes a method called Selective Context that employs self-information to filter out less informative content, thereby making more efficient use of the fixed context length. We demonstrate the effectiveness of our approach on tasks of summarisation and question answering across different data sources, including academic papers, news articles, and conversation transcripts.
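Self-information of a token is I(x) = -log p(x | preceding context), so predictable content scores low and can be dropped first. The Python sketch below assumes the per-token probabilities are already available (for example from a causal language model) and that the text has been split into units; it sums token self-information per unit, which follows from the chain rule, and then drops a fixed fraction of the least informative units. The ratio-based cut and the function names are illustrative assumptions, not necessarily the paper's exact filtering rule.

```python
import math
from typing import List, Tuple


def self_information(prob: float) -> float:
    """I(x) = -log2 p(x); less predictable tokens carry more information."""
    return -math.log2(prob)


def filter_units(units: List[Tuple[str, List[float]]], reduction: float = 0.5) -> List[str]:
    """Drop the `reduction` fraction of units with the lowest self-information.

    units: (text, [p(token | preceding context) for each token in the unit]).
    A unit's score is the sum of its tokens' self-information, since log
    probabilities of successive tokens add up. Surviving units keep their order.
    """
    scores = [sum(self_information(p) for p in probs) for _, probs in units]
    n_drop = int(len(units) * reduction)
    # Indices of the n_drop least informative units (ties broken by position).
    drop = set(sorted(range(len(units)), key=lambda i: (scores[i], i))[:n_drop])
    return [text for i, (text, _) in enumerate(units) if i not in drop]


# Hypothetical token probabilities for a short context split into units.
units = [("The", [0.5]), ("cat", [0.01]), ("sat", [0.1]),
         ("on the", [0.6, 0.9]), ("mat", [0.05]), (".", [0.95])]
kept = filter_units(units, reduction=0.5)
# kept == ["cat", "sat", "mat"]
```

With `reduction=0.5` half of the units are removed, mirroring the 50% context-cost reduction reported above; the highly predictable units ("The", "on the", ".") are the ones pruned.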
Open-source Frame Semantic Parsing
Frame semantic parsing (Gildea and Jurafsky, 2002) is a natural language understanding (NLU) task that involves finding structured semantic frames and their arguments in natural language text, as formalized by the FrameNet project (Baker et al., 1998). Frame semantics has proved useful in understanding user intent from text, finding use in modern voice assistants (Chen et al., 2019), dialog systems (Chen et al., 2013), and even text analysis (Zhao et al., 2023). A semantic frame in FrameNet describes an event, relation, or situation and its participants. When a frame occurs in a sentence, there is typically a "trigger" word in the sentence that is said to evoke the frame. In addition, a frame contains a list of arguments, known as frame elements, that describe the semantic roles pertaining to the frame. A sample sentence parsed for frame and frame elements is shown in Figure 1. FrameNet provides a list of lexical units (LUs) for each frame; these are word senses that may evoke the frame when they occur in a sentence. For instance, the frame "Attack" has lexical units "ambush.n", …
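The trigger-identification step described above (look up each word sense among the lexical units, and record which frame it may evoke) can be sketched in Python as follows. The LU index contains only the "ambush.n" to "Attack" pairing mentioned in the text; a real parser would load the full index from FrameNet. The token format, the POS tags, the example sentence, and all names are hypothetical, and frame-element filling is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical LU index: "lemma.pos" -> frames that the word sense may evoke.
LU_INDEX: Dict[str, List[str]] = {"ambush.n": ["Attack"]}


@dataclass
class FrameAnnotation:
    frame: str
    trigger: str  # surface word that evokes the frame
    # Frame element name -> text span; left empty in this sketch.
    elements: Dict[str, str] = field(default_factory=dict)


def candidate_frames(lemma: str, pos: str) -> List[str]:
    """LUs are keyed as lemma.pos, e.g. "ambush.n" for the noun "ambush"."""
    return LU_INDEX.get(f"{lemma.lower()}.{pos.lower()}", [])


def find_triggers(tokens: List[Tuple[str, str, str]]) -> List[FrameAnnotation]:
    """tokens: (word, lemma, pos) triples; returns one annotation per evoked frame."""
    found = []
    for word, lemma, pos in tokens:
        for frame in candidate_frames(lemma, pos):
            found.append(FrameAnnotation(frame=frame, trigger=word))
    return found


# Hypothetical pre-tagged sentence: "The rebels planned an ambush."
sentence = [("The", "the", "det"), ("rebels", "rebel", "n"),
            ("planned", "plan", "v"), ("an", "an", "det"), ("ambush", "ambush", "n")]
annotations = find_triggers(sentence)
```

Keying the index by lemma and part of speech lets a noun and a verb with the same spelling evoke different frames; the harder parts of the task, disambiguating among several candidate frames and labelling frame elements, are outside this sketch.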
That's All Folks: a KG of Values as Commonsense Social Norms and Behaviors
De Giorgis, Stefano, Gangemi, Aldo
Values, as understood in ethics, determine the shape and validity of moral and social norms, grounding our everyday individual and community behavior in commonsense knowledge. Formalising latent moral content in human interaction is an appealing prospect that would enable a deeper understanding of both social dynamics and individual cognitive and behavioral dimensions. To tackle this problem, several theoretical frameworks offer different value models and organize them into different taxonomies. The problem with the most widely used theories is that they adopt a culture-independent perspective, while many entities that are considered "values" are grounded in commonsense knowledge and expressed in everyday interaction. We propose here two ontological modules: FOLK, an ontology for values understood in their broad sense, and That's All Folks, a module for lexical and factual folk value triggers. Their purpose is to complement the main theories by providing a method for identifying values that are not contemplated by the major value theories but nonetheless play a key role in daily human interactions and shape social structures, cultural biases, and personal beliefs. The resource is tested by performing automatic detection of values from text with a frame-based approach.
Leveraging a New Spanish Corpus for Multilingual and Crosslingual Metaphor Detection
Sanchez-Bayona, Elisa, Agerri, Rodrigo
The lack of wide-coverage datasets annotated with everyday metaphorical expressions for languages other than English is striking. As a result, most research on supervised metaphor detection has been published only for English. To address this issue, this work presents the first Spanish corpus annotated with naturally occurring metaphors that is large enough to develop metaphor detection systems. The presented dataset, CoMeta, includes texts from various domains, namely news, political discourse, Wikipedia and reviews. To label CoMeta, we apply the MIPVU method, the most commonly used guidelines for systematically annotating metaphor in real data. We use our newly created dataset to provide competitive baselines by fine-tuning several multilingual and monolingual state-of-the-art large language models. Furthermore, by leveraging the existing VUAM English data in addition to CoMeta, we present, to the best of our knowledge, the first cross-lingual experiments on supervised metaphor detection. Finally, we perform a detailed error analysis that explores the seemingly high transfer of everyday metaphor across these two languages and datasets.